BTreeMap: tag and explain unsafe internal functions or assert preconditions #68468

ssomers · 2020-01-22T19:32:00Z

#68418 concluded that it's not desirable to tag all internal functions with preconditions as being unsafe. This PR does it to some functions, documents why, and elsewhere enforces the preconditions with asserts.

ssomers · 2020-01-22T19:57:31Z

Benchmark comparison should be the same as in #68418, I suppose - just some more unsafe tags and comments. These do seem to shift the focus of the optimizer somewhat away from set::is_subset, regardless of the usual noise.

>cargo benchcmp mastr2.txt middle2.txt --threshold 3
 name                                           mastr2.txt ns/iter  middle2.txt ns/iter  diff ns/iter  diff %  speedup
 btree::map::find_rand_100                      18                  17                             -1  -5.56%   x 1.06
 btree::map::find_rand_10_000                   58                  56                             -2  -3.45%   x 1.04
 btree::map::insert_seq_100                     45                  47                              2   4.44%   x 0.96
 btree::map::iter_1000                          2,641               2,880                         239   9.05%   x 0.92
 btree::map::iter_mut_1000                      2,705               2,823                         118   4.36%   x 0.96
 btree::set::build_and_pop_all                  6,065               5,828                        -237  -3.91%   x 1.04
 btree::set::difference_staggered_100_vs_100    855                 795                           -60  -7.02%   x 1.08
 btree::set::difference_staggered_100_vs_10k    2,405               2,311                         -94  -3.91%   x 1.04
 btree::set::difference_staggered_10k_vs_10k    78,083              75,095                     -2,988  -3.83%   x 1.04
 btree::set::intersection_100_neg_vs_100_pos    25                  24                             -1  -4.00%   x 1.04
 btree::set::intersection_10k_neg_vs_10k_pos    30                  31                              1   3.33%   x 0.97
 btree::set::intersection_random_100_vs_100     591                 646                            55   9.31%   x 0.91
 btree::set::intersection_random_100_vs_10k     2,251               2,361                         110   4.89%   x 0.95
 btree::set::intersection_random_10k_vs_100     2,276               2,362                          86   3.78%   x 0.96
 btree::set::intersection_staggered_100_vs_100  652                 717                            65   9.97%   x 0.91
 btree::set::intersection_staggered_100_vs_10k  2,075               2,201                         126   6.07%   x 0.94
 btree::set::intersection_staggered_10k_vs_10k  63,366              67,628                      4,262   6.73%   x 0.94
 btree::set::is_subset_100_vs_100               426                 459                            33   7.75%   x 0.93
 btree::set::is_subset_100_vs_10k               1,363               1,458                          95   6.97%   x 0.93
 btree::set::is_subset_10k_vs_10k               40,384              45,252                      4,868  12.05%   x 0.89

src/liballoc/collections/btree/node.rs

Mark-Simulacrum

Looks great! Left a few comments.

Mark-Simulacrum · 2020-01-22T23:38:13Z

src/liballoc/collections/btree/node.rs

@@ -369,7 +369,7 @@ impl<BorrowType, K, V, Type> NodeRef<BorrowType, K, V, Type> {
    /// If the node is a leaf, this function simply opens up its data.
    /// If the node is an internal node, so not a leaf, it does have all the data a leaf has
    /// (header, keys and values), and this function exposes that.
-    /// See `NodeRef` on why the node may not be a shared root.
+    /// Unsafe because the node may not be a shared root, see why in the `NodeRef` comments.


Suggested change

/// Unsafe because the node may not be a shared root, see why in the `NodeRef` comments.

/// The node must not be a shared root. For more information, see the `NodeRef` comments.

Hard to believe I fell in the old must-not/may-not trap (a Dutch thing). But now, is it clear why the function is unsafe? I edited the age-old comments on new_kv and new_edge just to get that across.

src/liballoc/collections/btree/map.rs

src/liballoc/collections/btree/node.rs

Mark-Simulacrum · 2020-01-22T23:54:30Z

src/liballoc/collections/btree/node.rs

@@ -647,7 +637,7 @@ impl<'a, K, V> NodeRef<marker::Mut<'a>, K, V, marker::Internal> {

    fn correct_childrens_parent_links(&mut self, first: usize, after_last: usize) {


This should be annotated, I think, as the first/after_last must be <= self.len() I believe.

(Or we can assert that condition).

Well, we can't simply assert it like that. split_off invokes move_suffix which invokes move_edges with a fixed dest_offset of 1, which is out of bounds if right_new_len is zero. Nothing is moved, no children's parents are corrected.

One may condemn split_off for investing time to move an empty suffix but I'm opting for a stranger debug_assert right now.

Changed my mind, added an if-test on right_new_len

Mark-Simulacrum · 2020-01-23T00:15:01Z

I also resolved your comments, as I agree that neither need be addressed here.

Mark-Simulacrum

Hopefully a final round and then we can get this landed.

Mark-Simulacrum · 2020-01-24T01:17:20Z

src/liballoc/collections/btree/node.rs

@@ -240,7 +240,7 @@ impl<K, V> Root<K, V> {
    /// Adds a new internal node with a single edge, pointing to the previous root, and make that
    /// new node the root. This increases the height by 1 and is the opposite of `pop_level`.
    pub fn push_level(&mut self) -> NodeRef<marker::Mut<'_>, K, V, marker::Internal> {
-        debug_assert!(!self.is_shared_root());
+        debug_assert!(!self.is_shared_root()); // &mut to a static is impossible anyway


I would like to see us drop the debug asserts that should be statically impossible (e.g. like here, where we have unique access).

So actually, not committing that (yet). Seems I should retract the comment instead. And perhaps mark unsafe. Or do you imply that &mut self logically implies that the caller should know that the shared root is unacceptable?

Yes, we should retract this. I missed that we don't actually "own" the root here necessarily (i.e., we may have a NonNull or perhaps not)

Mark-Simulacrum · 2020-01-24T01:18:58Z

src/liballoc/collections/btree/node.rs

-    pub fn new_kv(node: NodeRef<BorrowType, K, V, NodeType>, idx: usize) -> Self {
-        // Necessary for correctness, but in a private module
+    /// Creates a new handle to a key/value pair in `node`.
+    /// Unsafe because the caller must ensure that `idx` be less than `node.len()`.


Suggested change

/// Unsafe because the caller must ensure that `idx` be less than `node.len()`.

/// Unsafe because the caller must ensure that `idx < node.len()`.

/// Note that `node.len()` is always safe to call.

Please verify that the second line is actually true.

I'm not sure that node.len() is safe. Everything I learnt over the past months tells me that it reads through a pointer that is always well aligned from memory that is always allocated, storing the value in a register or stack exclusive to the caller. But in the course of #58431, @RalfJung wrote in into_slices_mut that len wreaks havoc on things elsewhere and had to be avoided.

Hm, it is true that under stacked borrows a call to len() can invalidate some other reference, but that doesn't really make it unsafe?

I'm reading up on stacked borrows, and two hours later I might have understood something, i.e. what you're asking. I guess invalidating is a lot more subtle than killing. Still bad enough that it prompted the rewrite of into_slices_mut, but that was only to satisfy Miri and to avoid future trouble - there was no genuine UB. "Verify" means to check that Miri is happy executing the expressions in the debug_assert (while it normally skips those).

If that's right, then yes, the second line is actually true.

Also applied that comment style to new_edge

Calling len has no preconditon other than the argument being well-typed

Sure, but I was thinking of a precondition (or restriction, if precondition is the wrong term) on the unsafe new_kv function. Though it does seem that if you start documenting that, you'd have to do that in a lot of functions for a lot of "interactions" with memory.

It's not a precondition, I'd say it's part of the side-effects of this function. It's part of the side-effect of all &self functions that calling it kills exclusive aliases to self. This one here is special only insofar as the NodeRef type is used instead of a "normal" reference.

Right, I understand. "Note that node.len() is always safe to call" does not go into what side effects there might be.

My answer to "that the second line is actually true" is simply: yes, NodeRef::len() is not marked unsafe.

And Mark's point is that it is merely a hint to the caller, not a justification of the debug_assert, so there's not real reason to keep it nor to remove it.

Yeah, let's drop it from here perhaps and make an explicit note on NodeRef::len()? Unfortunately given the unsafety landscape today the "is not unsafe" part of NodeRef::len() doesn't really actually play out to being necessarily indicative of lacking preconditions :)

I dropped it and recycled the bytes to somewhat wishy-washy comments in NodeRef::len(). And not on NodeRef::height() or any other &self method...

src/liballoc/collections/btree/node.rs

Mark-Simulacrum · 2020-01-24T01:21:06Z

Please also squash the commits into one while addressing the latest feedback.

ssomers

I transplanted the earlier demo into a commit on top of the current commits here, which gives branch btreemap_prefer_middle_demo. Testing btree::map::test_basic_small clearly demonstrates to me that there is no protection against a &mut self to the shared root, apart from the debug_assert inside into_key_slice_mut. Even Miri doesn't object (though of course a little later Miri does object to transmuting the shared root to a leaf pointer).

Perhaps what you based on is that the node field of a NodeRef cannot be turned into a '&mut` on its own?

Mark-Simulacrum · 2020-01-24T12:51:24Z

Okay, I think this is good to go modulo the &mut to a static is impossible comments, sorry for misleading you there. (Well, it's true, but that's not what those method signatures mean, due to indirection through a raw pointer).

ssomers · 2020-01-24T14:02:30Z

I've become quite fond of the idea that marker::Mut should imply exclusivity. But alas, pretending to have a marker::Mut reference to the shared root is exactly what get_mut and remove do. search_node returns such a marker::Mut edge handle into the shared root hinting where the desired key could be inserted. Other mutable operations on BTreeMap don't face this because they swap out the shared root for a real root up front, but that would obviously be wasteful for get_mut and remove.

ssomers · 2020-01-24T14:07:52Z

So really what I just wrote is that search_node should be made unsafe, because it can return a baloney SearchResult::GoDown result.

ssomers · 2020-01-28T19:22:56Z

search_node should be made unsafe, because it can return a baloney SearchResult::GoDown result.

I tried split up the search code such that it never returns baloney handles, but it's not the only place where this happens: even the Drop implementation constructs a RangeMut with equal front and back edge handles, which point into the shared root.

So such handles are not baloney but unavoidable, and marker::Mut cannot imply exclusivity.

…itions Co-Authored-By: Mark Rousskov <mark.simulacrum@gmail.com>

ssomers · 2020-01-29T15:13:50Z

So such handles are not baloney but unavoidable, and marker::Mut cannot imply exclusivity.

I couldn't let it be, could I... Turns out it's relatively easy to get rid of marker::Mut/Owned NodeRefs to shared roots altogether. But I'm not planning to include any of that in this PR.

Mark-Simulacrum · 2020-01-29T15:38:06Z

@bors r+

Let's land this so we can make some progress, and we can continue iterating in future PRs if necessary.

bors · 2020-01-29T15:38:08Z

📌 Commit ba87a50 has been approved by Mark-Simulacrum

…ark-Simulacrum BTreeMap: tag and explain unsafe internal functions or assert preconditions rust-lang#68418 concluded that it's not desirable to tag all internal functions with preconditions as being unsafe. This PR does it to some functions, documents why, and elsewhere enforces the preconditions with asserts.

@ghost

Rollup of 6 pull requests Successful merges: - #66648 (Implement clone_from for BTreeMap and BTreeSet) - #68468 (BTreeMap: tag and explain unsafe internal functions or assert preconditions) - #68626 (Use termize instead of term_size) - #68640 (Document remaining undocumented `From` implementations for IPs) - #68651 (Document `From` implementation for NonZero nums) - #68655 (Fix revision annotations in borrowck-feature-nll-overrides-migrate) Failed merges: r? @ghost

bors · 2020-01-30T05:37:31Z

⌛ Testing commit ba87a50 with merge 3024c4e...

rust-highfive assigned Mark-Simulacrum Jan 22, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 22, 2020

ssomers commented Jan 22, 2020

View reviewed changes

src/liballoc/collections/btree/node.rs Show resolved Hide resolved

ssomers commented Jan 22, 2020

View reviewed changes

src/liballoc/collections/btree/node.rs Show resolved Hide resolved

Mark-Simulacrum reviewed Jan 23, 2020

View reviewed changes

Mark-Simulacrum reviewed Jan 24, 2020

View reviewed changes

ssomers commented Jan 24, 2020

View reviewed changes

ssomers force-pushed the btreemap_prefer_middle branch 2 times, most recently from ee8f7a2 to 5aecf55 Compare January 24, 2020 17:48

BTreeMap: tag and explain unsafe internal functions or assert precond…

ba87a50

…itions Co-Authored-By: Mark Rousskov <mark.simulacrum@gmail.com>

ssomers force-pushed the btreemap_prefer_middle branch from 5aecf55 to ba87a50 Compare January 29, 2020 10:54

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 29, 2020

Dylan-DPC-zz mentioned this pull request Jan 30, 2020

Rollup of 6 pull requests #68659

Merged

bors merged commit ba87a50 into rust-lang:master Jan 30, 2020

ssomers deleted the btreemap_prefer_middle branch January 30, 2020 08:59

ssomers mentioned this pull request Feb 16, 2020

BTree: lighten the load on Miri #68781

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BTreeMap: tag and explain unsafe internal functions or assert preconditions #68468

BTreeMap: tag and explain unsafe internal functions or assert preconditions #68468

ssomers commented Jan 22, 2020 •

edited by Mark-Simulacrum

Loading

ssomers commented Jan 22, 2020

Mark-Simulacrum left a comment

Mark-Simulacrum Jan 22, 2020

ssomers Jan 23, 2020 •

edited

Loading

Mark-Simulacrum Jan 22, 2020

ssomers Jan 23, 2020

ssomers Jan 23, 2020

Mark-Simulacrum commented Jan 23, 2020

Mark-Simulacrum left a comment

Mark-Simulacrum Jan 24, 2020

ssomers Jan 24, 2020

ssomers Jan 24, 2020

Mark-Simulacrum Jan 24, 2020

Mark-Simulacrum Jan 24, 2020

ssomers Jan 24, 2020

Mark-Simulacrum Jan 24, 2020

ssomers Jan 24, 2020

ssomers Jan 24, 2020

ssomers Jan 28, 2020

RalfJung Jan 28, 2020

ssomers Jan 28, 2020

Mark-Simulacrum Jan 28, 2020

ssomers Jan 29, 2020

Mark-Simulacrum commented Jan 24, 2020

ssomers left a comment

Mark-Simulacrum commented Jan 24, 2020

ssomers commented Jan 24, 2020

ssomers commented Jan 24, 2020

ssomers commented Jan 28, 2020

ssomers commented Jan 29, 2020 •

edited

Loading

Mark-Simulacrum commented Jan 29, 2020

bors commented Jan 29, 2020

bors commented Jan 30, 2020

	/// Unsafe because the node may not be a shared root, see why in the `NodeRef` comments.
	/// The node must not be a shared root. For more information, see the `NodeRef` comments.

		@@ -647,7 +637,7 @@ impl<'a, K, V> NodeRef<marker::Mut<'a>, K, V, marker::Internal> {

		fn correct_childrens_parent_links(&mut self, first: usize, after_last: usize) {

	/// Unsafe because the caller must ensure that `idx` be less than `node.len()`.
	/// Unsafe because the caller must ensure that `idx < node.len()`.
	/// Note that `node.len()` is always safe to call.

BTreeMap: tag and explain unsafe internal functions or assert preconditions #68468

BTreeMap: tag and explain unsafe internal functions or assert preconditions #68468

Conversation

ssomers commented Jan 22, 2020 • edited by Mark-Simulacrum Loading

ssomers commented Jan 22, 2020

Mark-Simulacrum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssomers Jan 23, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Mark-Simulacrum commented Jan 23, 2020

Mark-Simulacrum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Mark-Simulacrum commented Jan 24, 2020

ssomers left a comment

Choose a reason for hiding this comment

Mark-Simulacrum commented Jan 24, 2020

ssomers commented Jan 24, 2020

ssomers commented Jan 24, 2020

ssomers commented Jan 28, 2020

ssomers commented Jan 29, 2020 • edited Loading

Mark-Simulacrum commented Jan 29, 2020

bors commented Jan 29, 2020

bors commented Jan 30, 2020

ssomers commented Jan 22, 2020 •

edited by Mark-Simulacrum

Loading

ssomers Jan 23, 2020 •

edited

Loading

ssomers commented Jan 29, 2020 •

edited

Loading